(2.1.2) ILP History, Overview and Prespective

Joseph A. Fisher and B. Ramakrishna Rau. Instruction-Level Parallel Processing, Science, 13 September 1991, pp. 1233-1241. Science Link

1992

For ILP

The dependences between operations must be determined
The operations that are independent of any other operatation that has not as yet completed must be determined
These independent operations must be scheduled to execute at some particular time, on some specific funct unit, and must be assigned a register to which the result may be deposited.

Hardware and Software Techniques for ILP Exec

Pipelining / multiple functional units

+ ILP increase

- interconn network/reg files goes up as square of funct units.

Superpipelining > eg : int ALU is pipelined.

Superscalar : multiple instruction issue of seq programs

Horizontal microprogramming >

Compiler decides

VLIW issues

Problem : small size of basic blocks

Dynamic schemes for speculative execution

terminate unnec speculative computation once the branch has been resolved

undo the effects of operation that should not been executed

ensure no exceptions are reported

preserve enuf exec state to resume

Compile time > trace scheduling

Branch prediction to guide speculative scheduling

gather execution statistics

compile time profiling (Trace sched/superblock sched)

Dynamic branch predition ++

Predicted execution : to selectively squash.

ILP Compilation

Scheduling > NP complete problem

Only a problem on VLIW kind processors (non sequential)

Local Scheduling

sheduling barrier is assumed to exist between adjacent basic blocks in control flow graph

(Local code compaction)

+ computationally inexpensive (one-pass)

+ near optimal

Global Acyclic Scheduling

Trace Sched

compiler often can, by rearranging its generated machine instructions for faster execution, improve program performance. Trace scheduling is one of many known techniques for doing so.

Trace scheduling was originally developed for Very Long Instruction Word, or VLIW machines, and is a form of global code motion. It works by converting a loop to long straight-line code sequence using loop unrolling and static branch prediction. This process separates out "unlikely" code and adds handlers for exits from trace. The goal is to have the most common case executed as a sequential set of instructions without branches.

SuperBlock Schel

a simplified form of trace scheduling which does not attempt to merge control flow paths at trace "side entrances". Instead, code can be implemented by more than one schedule, vastly simplifying the code generator

Cyclic Scheduling

Software pipelining

Take loops, unroll them find repeating patterns and reroll them. modulo constraint : can be because of initiation interval or Resource constraints.

>> wiki : Instruction scheduling

+ adequate registers more important than optimizing schedule.

+ shorter schedules > higher register pressure.

- Antidependences and output dependences

prepass scheduling (sched before reg allocation)

graph coloring

best : schedule first, register allocate, shedule again.